---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-05-19
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day on which a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see the Temporal variation model fitting section for details), highlighting the temporal trend in each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly from the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 39% (28%-52%) 7,072 173
Albania 84% (45%-100%) 948 31
Algeria 33% (22%-47%) 7,201 555
Andorra 20% (8.4%-33%) 761 51
Argentina 25% (19%-33%) 8,358 382
Armenia 71% (49%-97%) 4,823 61
Australia 88% (58%-100%) 7,060 99
Austria 56% (35%-83%) 16,179 629
Azerbaijan 85% (59%-100%) 3,287 40
Bahamas 52% (13%-99%) 96 11
Bangladesh 77% (45%-100%) 23,870 349
Belarus 99% (93%-100%) 30,572 171
Belgium 13% (10%-16%) 55,559 9,080
Bolivia 23% (17%-31%) 4,263 174
Bosnia and Herzegovina 14% (8%-23%) 2,303 132
Brazil 15% (13%-18%) 254,220 16,792
Bulgaria 24% (17%-35%) 2,259 112
Burkina Faso 29% (16%-56%) 796 51
Cameroon 81% (54%-99%) 3,292 140
Canada 17% (14%-20%) 78,061 5,842
Chad 11% (5.7%-19%) 519 53
Chile 86% (63%-100%) 46,059 478
China 98% (76%-100%) 84,063 4,638
Colombia 35% (27%-45%) 16,295 592
Congo 42% (16%-91%) 412 15
Cote d'Ivoire 82% (52%-100%) 2,119 28
Croatia 23% (12%-38%) 2,228 95
Cuba 43% (25%-72%) 1,881 79
Cyprus 83% (50%-100%) 917 17
Czechia 35% (25%-45%) 8,586 297
Democratic Republic of the Congo 28% (17%-44%) 1,629 61
Dominican Republic 62% (47%-77%) 12,725 434
Ecuador 5.8% (4.1%-7.1%) 33,582 2,799
Egypt 32% (24%-40%) 12,764 645
El Salvador 42% (24%-67%) 1,413 30
Estonia 34% (21%-52%) 1,784 64
Finland 35% (21%-54%) 6,380 300
France 6% (4.8%-7.5%) 142,903 28,239
Georgia 61% (23%-99%) 702 12
Germany 30% (23%-37%) 175,210 8,007
Ghana 96% (84%-100%) 5,735 29
Greece 24% (14%-37%) 2,836 165
Guatemala 46% (29%-71%) 2,001 38
Guernsey 44% (14%-95%) 252 13
Guinea 91% (69%-100%) 2,796 16
Guyana 57% (13%-100%) 124 10
Haiti 20% (8.1%-50%) 533 21
Honduras 28% (18%-44%) 2,798 146
Hungary 11% (7.8%-16%) 3,556 467
Iceland 87% (56%-100%) 1,802 10
India 37% (30%-45%) 101,139 3,163
Indonesia 14% (11%-19%) 18,010 1,191
Iran 42% (34%-51%) 122,492 7,057
Iraq 47% (25%-81%) 3,554 127
Ireland 30% (22%-40%) 24,200 1,547
Isle of Man 20% (8%-60%) 335 24
Israel 80% (58%-98%) 16,621 272
Italy 13% (11%-15%) 225,886 32,007
Japan 14% (11%-19%) 16,365 763
Jersey 15% (6.7%-38%) 303 27
Kazakhstan 97% (87%-100%) 6,751 35
Kenya 15% (8.9%-22%) 912 50
Kosovo 53% (30%-89%) 955 29
Kuwait 87% (67%-100%) 15,691 118
Kyrgyzstan 76% (41%-100%) 1,243 14
Latvia 55% (27%-95%) 1,009 19
Lebanon 63% (33%-100%) 931 26
Liberia 47% (11%-99%) 229 22
Lithuania 33% (18%-51%) 1,547 59
Luxembourg 46% (26%-66%) 3,947 107
Malaysia 95% (74%-100%) 6,941 113
Mali 19% (12%-28%) 874 52
Mauritius 61% (17%-100%) 332 10
Mexico 8.9% (7.2%-11%) 51,633 5,332
Moldova 29% (22%-38%) 6,138 215
Morocco 97% (85%-100%) 6,952 192
Netherlands 19% (15%-24%) 44,141 5,694
New Zealand 49% (23%-89%) 1,153 21
Niger 15% (7.1%-27%) 909 55
Nigeria 42% (29%-59%) 6,175 191
North Macedonia 24% (16%-36%) 1,817 104
Norway 54% (30%-86%) 8,249 233
Oman 96% (82%-100%) 5,379 26
Pakistan 53% (42%-66%) 43,966 939
Panama 52% (37%-69%) 9,726 279
Paraguay 87% (57%-100%) 788 11
Peru 31% (25%-37%) 94,933 2,789
Philippines 22% (17%-28%) 12,718 831
Poland 29% (22%-36%) 18,885 936
Portugal 40% (31%-51%) 29,209 1,231
Puerto Rico 38% (25%-57%) 2,710 124
Qatar 89% (53%-100%) 33,969 15
Romania 19% (15%-24%) 17,036 1,107
Russia 93% (83%-100%) 290,678 2,722
San Marino 80% (43%-100%) 654 41
Saudi Arabia 99% (93%-100%) 57,345 320
Senegal 77% (49%-100%) 2,544 26
Serbia 88% (61%-100%) 10,699 231
Sierra Leone 12% (7.1%-19%) 519 33
Singapore 93% (68%-100%) 28,343 22
Sint Maarten 12% (4.1%-33%) 77 15
Slovakia 65% (37%-97%) 1,495 28
Slovenia 17% (11%-29%) 1,466 104
Somalia 45% (24%-73%) 1,455 57
South Africa 51% (38%-67%) 16,433 286
South Korea 43% (16%-82%) 11,078 263
Spain 15% (12%-18%) 231,606 27,709
Sudan 31% (21%-47%) 2,591 105
Sweden 14% (11%-17%) 30,377 3,698
Switzerland 24% (19%-31%) 30,514 1,602
Tajikistan 23% (15%-34%) 1,729 41
Thailand 77% (51%-100%) 3,031 56
Togo 58% (18%-99%) 330 12
Tunisia 61% (27%-99%) 1,043 46
Turkey 66% (53%-79%) 150,593 4,171
Ukraine 32% (24%-43%) 18,616 535
United Arab Emirates 94% (80%-100%) 24,190 224
United Kingdom 18% (15%-22%) 246,406 34,796
United Republic of Tanzania 49% (25%-88%) 509 21
United States of America 25% (20%-29%) 1,508,598 90,353
Uruguay 44% (21%-84%) 737 20
Uzbekistan 92% (70%-100%) 2,802 13
Venezuela 84% (48%-100%) 618 10

Table 1: Estimates of the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals are calculated using an exact binomial test at the 95% confidence level.

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between 17 December 2019 and 22 January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the cumulative number of cases in the denominator of the cCFR calculation, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
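As a rough illustration of the correction above, the following Python sketch discretises a lognormal delay distribution with the stated mean and standard deviation and evaluates the numerator of \(u_t\) for each day. The function names, the daily discretisation of the delay distribution, the toy case and death series, and the final aggregation into a cCFR (total deaths divided by total cases with known outcomes) are all assumptions made for illustration, not a description of the code used for this report.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def delay_pmf(max_delay, mean=13.0, sd=12.7):
    """Daily delay probabilities f_0..f_max_delay for a lognormal with given mean and sd."""
    # Convert the mean and sd on the natural scale to log-scale parameters.
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    sigma = math.sqrt(sigma2)
    # Assumption: f_j is the probability mass between day j and day j + 1.
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_delay + 1)]

def known_outcomes(cases, f):
    """Numerator of u_t for each day t: sum over j = 0..t of c_{t-j} * f_j."""
    return [sum(cases[t - j] * f[j] for j in range(t + 1))
            for t in range(len(cases))]

def corrected_cfr(cases, deaths):
    """Assumed aggregation: total deaths over total cases with known outcomes."""
    f = delay_pmf(len(cases) - 1)
    return sum(deaths) / sum(known_outcomes(cases, f))

# Toy daily series, invented for illustration only.
cases = [10, 20, 40, 80, 120, 150, 160, 150, 140, 130]
deaths = [0, 0, 0, 1, 1, 2, 3, 4, 5, 6]

naive = sum(deaths) / sum(cases)
corrected = corrected_cfr(cases, deaths)
print(f"nCFR = {naive:.3f}, cCFR = {corrected:.3f}")
```

Because only a fraction of recent cases have a known outcome, the corrected denominator is smaller than the raw case count, so the cCFR in this sketch is larger than the nCFR.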

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate the approximate fraction of cases reported.
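The ratio described above can be written as a one-line function. This is a minimal sketch: the function name is hypothetical, and capping the result at 100% is an assumption suggested by the upper bounds shown in Table 1 rather than something stated in the text.

```python
BASELINE_CFR = 0.014  # baseline CFR of 1.4% assumed in the text

def proportion_reported(ccfr, baseline=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline / cCFR."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    # Assumption: a cCFR below the baseline is read as full reporting (capped at 1).
    return min(1.0, baseline / ccfr)

# A cCFR of 10% is ten-fourteenths higher than expected under full reporting,
# so only about 14% of symptomatic cases would be reported.
high_ccfr_example = proportion_reported(0.10)
low_ccfr_example = proportion_reported(0.01)
```

For example, a country with a cCFR of 10% would have an estimated 14% of symptomatic cases reported, while a country with a cCFR at or below 1.4% would be treated as reporting all of them.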

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the R packages greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal(-1, 1)}: \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal(4, 0.5)}: \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal(0, 0.5)}: \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\) and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important because some countries have been known to report an unusually large number of deaths on a single day, due to past under-reporting.
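To make the kernel structure concrete, the sketch below builds the covariance matrix implied by adding a squared-exponential kernel to an i.i.d. noise kernel. It is written in plain Python rather than greta.gp, and the function names, the parameterisation of the squared-exponential form, and the illustrative parameter values are assumptions; the fitted model may differ in detail.

```python
import math

def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points (assumed standard form)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def combined_covariance(times, variance, lengthscale, obs_variance):
    """Covariance matrix of R + O: squared-exponential plus i.i.d. noise on the diagonal."""
    n = len(times)
    return [[reporting_kernel(times[i], times[j], variance, lengthscale)
             + (obs_variance if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]

# Illustrative values: exp(-1) and exp(4) are the medians of the two lognormal
# priors above; the observation variance of 0.1 is arbitrary.
K = combined_covariance(list(range(5)),
                        variance=math.exp(-1),
                        lengthscale=math.exp(4),
                        obs_variance=0.1)
```

With a lengthscale of around \(e^{4} \approx 55\) days, neighbouring days are almost perfectly correlated under the reporting kernel, so day-to-day jumps in the data are absorbed by the noise term on the diagonal rather than by the smooth trend.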

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
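The likelihood step can be sketched as follows. The expression for the expected number of deaths is an assumption obtained by rearranging the relationship between the baseline CFR, the cCFR, and the reporting proportion (expected deaths = baseline CFR × cases with known outcomes ÷ proportion reported); the function names and the toy numbers are hypothetical.

```python
import math

def expected_deaths(known_cases, reporting, baseline_cfr=0.014):
    """Expected deaths per day implied by the baseline CFR and a reporting proportion."""
    # Assumption: rearranges reporting = baseline / cCFR, with cCFR = deaths / known cases.
    return [baseline_cfr * k / r for k, r in zip(known_cases, reporting)]

def poisson_log_likelihood(observed, expected):
    """Sum over days of log Poisson(observed | rate = expected)."""
    return sum(d * math.log(lam) - lam - math.lgamma(d + 1)
               for d, lam in zip(observed, expected))

# Toy data constructed so that a reporting proportion of 0.2 fits exactly.
known_cases = [200.0, 400.0, 600.0]
observed = [14, 28, 42]

ll_by_reporting = {
    r: poisson_log_likelihood(observed, expected_deaths(known_cases, [r] * 3))
    for r in (0.1, 0.2, 0.5)
}
```

In this toy example the log-likelihood is highest at a reporting proportion of 0.2, because that value makes the expected deaths equal to the observed deaths on every day.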

Adjusting case counts for under-reporting

We adjust the reported number of cases each day, pulled from the ECDC. Specifically, we divide the case numbers for each day by the “proportion of cases reported” estimate that we calculate for that day and country.
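A minimal sketch of this adjustment is given below, including the interval obtained by applying the bounds of the reporting estimate directly, as described for Figure 2. The function name and the example numbers are hypothetical.

```python
def adjust_cases(reported, prop_mid, prop_low, prop_high):
    """Return (lower, central, upper) estimates of true symptomatic cases per day."""
    adjusted = []
    for c, mid, lo, hi in zip(reported, prop_mid, prop_low, prop_high):
        # A higher reporting proportion implies fewer true cases, so the bounds swap:
        # the upper reporting bound gives the lower bound on adjusted cases.
        adjusted.append((c / hi, c / mid, c / lo))
    return adjusted

# One day with 100 reported cases and an estimated 25% (20%-50%) of cases reported.
example = adjust_cases([100], [0.25], [0.2], [0.5])
```

In this example, 100 reported cases correspond to roughly 400 true symptomatic cases, with an interval of roughly 200 to 500.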

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely due to under-reporting. In reality, the burden on the healthcare system is likely to contribute to CFR estimates above 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that have enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.